Towards a self-review question-generator for Wikipedia articles

Andrew K F Lui and Ng Sin-chun
Open University of Hong Kong
Hong Kong SAR, China

Li Siu Cheung
Hong Kong Baptist University
Hong Kong SAR, China


Wikipedia has become a major resource for teaching and learning. Although there is still some scepticism about its reliability and scholarly standard, many schools and universities have incorporated Wikipedia articles into their syllabuses. With an increased effort to manage the reliability of Wikipedia articles and proper reviewing by teachers, there is little reason to reject this free, timely and abundant source of learning content. When a student is asked to read an article, it is a good practice to provide self-review questions to guide the focus of reading. Such self-review questions assess knowledge of the major concepts and their relations in the text. The project outlined in this paper aims to develop a self-review question-generator for Wikipedia articles. The system exploits the style and structural uniformity of Wikipedia articles and applies natural language processing techniques to electronic text to identify the key concepts and create relevant questions. Part of the system is a wrapper interface that supports the viewing of an article and the display of dynamically generated exercises at the same time.

Questioning and answering is often central to the learning process, and good questions motivate students to find answers. Designing questions for students is an intellectually challenging task, so automating the process is correspondingly difficult. A good question in a teaching-learning context should be short, clear, pitched at the right level and phrased in an appropriate manner. There are many types of questions, most often categorized according to their purpose or intention. Of these, the closed type, which has a specific answer, is the easiest to automate fully: given a known answer and its context, one can design an algorithm to formulate a simple question. Several systems have been proposed for generating cloze tests that assess students' ability to use context and vocabulary. These systems identify a vocabulary item in a sentence and remove it, leaving students to infer the missing word from the sentence context. The key is to find vocabulary at an appropriate level of challenge for the students.
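The cloze procedure described above can be illustrated with a minimal Python sketch. It is not taken from any of the systems mentioned; the function names `pick_target` and `make_cloze`, and the `difficulty` table that maps words to a challenge score, are hypothetical and introduced only for illustration. A real system would derive such scores from a graded vocabulary list or corpus statistics.

```python
import re


def pick_target(sentence, difficulty):
    """Choose the word in the sentence with the highest difficulty score.

    `difficulty` is a hypothetical dict mapping lower-case words to a score;
    words missing from the table are never chosen.
    """
    words = re.findall(r"[A-Za-z]+", sentence)
    candidates = [w for w in words if w.lower() in difficulty]
    if not candidates:
        return None
    return max(candidates, key=lambda w: difficulty[w.lower()])


def make_cloze(sentence, target, blank="_____"):
    """Blank out the first whole-word occurrence of `target`.

    Returns (question, answer), or None if the target is not in the sentence.
    """
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    match = pattern.search(sentence)
    if match is None:
        return None
    question = sentence[:match.start()] + blank + sentence[match.end():]
    return question, match.group(0)


if __name__ == "__main__":
    # Illustrative sentence and scores only; not data from the paper.
    sentence = "Photosynthesis converts light energy into chemical energy."
    difficulty = {"photosynthesis": 0.9, "chemical": 0.5, "energy": 0.3}
    target = pick_target(sentence, difficulty)
    print(make_cloze(sentence, target))
```

Choosing the target by a score threshold matched to the reader, rather than always taking the hardest word, would be one way to address the level-of-challenge problem noted above.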

This paper describes Wikaquest, a system that generates self-review questions for Wikipedia articles. Many of these questions are of the closed type, asking for specific conceptual knowledge explicitly quoted in the text. Wikaquest operates in the following manner:

In essence, the operation involves syntactic analysis to identify the concepts and their relations, and statistical analysis to estimate the importance of each concept so that the most appropriate ones can be selected for question construction.
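The two kinds of analysis can be sketched in a few lines of Python. This is a toy illustration under strong assumptions and not the Wikaquest implementation: a regular expression for definitional sentences of the form "X is Y" stands in for syntactic analysis, and a raw occurrence count stands in for the statistical estimate of importance. All names (`extract_relations`, `importance`, `generate_questions`) are hypothetical.

```python
import re

# Stand-in for syntactic analysis: matches sentences such as "X is Y."
# where X starts with a capital letter. A real system would use a parser.
DEFINITION = re.compile(
    r"^(?P<concept>[A-Z][\w\- ]*?) (?:is|are) (?P<definition>.+?)\.?$"
)


def split_sentences(text):
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_relations(text):
    """Return (concept, definition) pairs found in definitional sentences."""
    relations = []
    for sentence in split_sentences(text):
        match = DEFINITION.match(sentence)
        if match:
            relations.append((match.group("concept"), match.group("definition")))
    return relations


def importance(concept, text):
    """Stand-in for statistical analysis: count occurrences of the concept."""
    pattern = r"\b" + re.escape(concept) + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))


def generate_questions(text, top_k=2):
    """Build closed questions for the top_k most frequent concepts."""
    ranked = sorted(
        extract_relations(text),
        key=lambda rel: importance(rel[0], text),
        reverse=True,
    )
    return [("What is " + c + "?", d) for c, d in ranked[:top_k]]


if __name__ == "__main__":
    # Illustrative text only; not drawn from a Wikipedia article.
    article = (
        "Mitosis is a type of cell division. "
        "Mitosis produces two daughter cells. "
        "The cell cycle includes mitosis. "
        "Cytokinesis is the division of the cytoplasm."
    )
    for question, answer in generate_questions(article):
        print(question, "->", answer)
```

In this sketch the answer to each question is quoted directly from the text, which matches the closed question type described earlier; the ranking step simply decides which concepts are worth asking about.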

The remainder of the paper describes the operation of Wikaquest in more detail. It also analyses the problem of automatic question generation, reviews relevant existing systems, and outlines the prototype implementation of Wikaquest and its evaluation.